Library catalogues and open quantification of knowledge production 1470-1800

Mikko Tolonen, Leo Lahti

June 11, 2015

Open analytical ecosystems for digital humanities

Open science principles

Library catalogues: the data

https://github.com/rOpenGov/estc



Publishing “history” in Britain and North America 1470-1800

Research questions

  1. Who wrote history?
  2. Where was it published?
  3. How does the publishing of history change over the early modern period?

ESTC raw data

Hierarchical information (MARC fields); only some fields are relevant for our study


Workflow, step by step


Load the data and tools

Load the data and tools in R:

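library(knitr)           # kable()
library(dplyr)           # %>%, filter(), group_by(), summarize()
library(ggplot2)
library(bibliographica)
load("df.RData")         # ESTC data used on these slides
kable(t(df.orig[22495, ]))  # one record, with fields shown as rows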
22495
008..partial-Language English
100.a-Author Gauden, John,
100.d-Author, dates 1605-1662,
100.d..partial-Author, birth 1605
100.d..partial.1-Author, death 1662
240.n-Part/section of a work NA
245.a-Title Eikōn basilikē
260.a-Place of publication [London] :
260.b-Publisher Reprinted in Regis memoriam, for John Williams,
260.b..partial-Printed for John Williams
260.c-Publication date 1649.
260.c..partial-Publication date, clean 1649
300.a-Extent [8], 175, [9] p., [2] leaves of plates :
300.c-Dimensions 10 cm (12⁰)
650.a-Subject NA
650.y.651.y-Chronological subdivision Civil War, 1642-1649;Civil War, 1642-1649.
650.z.651.a.651.z-Geographic name and subdivision Great Britain
65..series-Additional years ;;;;1642;1649
-NA NA

Polishing page counts

Raw page counts

rawpages <- as.character(unique(df.orig[sample(nrow(df.orig), 6), "300.a-Extent"]))
kable(rawpages)
12p. ;
[24], 192 p., [2] leaves of plates (1 folded) :
7, [1] p. ;
1 sheet ;
1 sheet ([1]) p. ;
[8] p. ;

Polish page counts

polish_pages(rawpages)$total.pages
## [1]  12 220   8   2   2   8
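To show what "polishing" an extent string involves, here is a simplified, hypothetical page counter. It is not the bibliographica implementation of polish_pages(). The counting rules are our assumptions, chosen so that it reproduces the six totals above: a broadside "sheet" counts as 2 pages, each leaf of plates adds 2 pages, and numbered and [bracketed] unnumbered sequences before "p." are summed.

```r
# Hypothetical, simplified page counter; NOT bibliographica::polish_pages().
# Assumed rules:
#   - "N sheet(s)" (broadside) counts as 2 pages per sheet (both sides);
#   - "[N] leaves of plates" adds 2 pages per leaf;
#   - numbered and [bracketed] unnumbered sequences before "p" are summed.
count_pages_sketch <- function(extent) {
  count_one <- function(x) {
    x <- tolower(x)
    # Broadsides, e.g. "1 sheet ;" or "1 sheet ([1]) p. ;"
    sheet <- regmatches(x, regexec("^([0-9]+)\\s+sheets?", x, perl = TRUE))[[1]]
    if (length(sheet) > 0) return(2 * as.numeric(sheet[2]))
    # Leaves of plates, e.g. "[2] leaves of plates (1 folded)"
    plates <- regmatches(x, regexec("\\[?([0-9]+)\\]?\\s+lea(f|ves)", x, perl = TRUE))[[1]]
    plate_pages <- if (length(plates) > 0) 2 * as.numeric(plates[2]) else 0
    # Drop the plates statement, then keep only the text before "p" / "p."
    x <- sub("\\[?[0-9]+\\]?\\s+lea(f|ves).*$", "", x, perl = TRUE)
    seqs <- sub("\\s*p\\b.*$", "", x, perl = TRUE)
    nums <- as.numeric(regmatches(seqs, gregexpr("[0-9]+", seqs))[[1]])
    sum(nums) + plate_pages
  }
  vapply(as.character(extent), count_one, numeric(1), USE.NAMES = FALSE)
}

count_pages_sketch(c("12p. ;", "[24], 192 p., [2] leaves of plates (1 folded) :",
                     "7, [1] p. ;", "1 sheet ;", "1 sheet ([1]) p. ;", "[8] p. ;"))
# expected: 12 220 8 2 2 8
```

Real ESTC extents are far more varied (volumes, columns, errors in pagination). A production parser needs many more rules than these three.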

Document dimension field

kable(as.character(sample(unique(df.orig[, "300.c-Dimensions"]), 6)))
50 x 40 cm.
49-50 cm. (2⁰)
24 x 20 cm.
50 cm (2⁰)
20 cm. (8⁰)
53 x 33 cm.

Polish document dimensions

Extract the dimension information

kable(polish_dimensions("10 cm (12⁰)"))
original     gatherings  width  height
10 cm (12⁰)  12to        NA     10
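A hypothetical sketch of the extraction step for the dimension strings sampled above, written in base R. It is not bibliographica's polish_dimensions(). Its rules are assumptions: "H x W cm" lists height first, following the usual catalogue convention; a range such as "49-50 cm" keeps the upper bound; and "(12⁰)" yields the format number 12 rather than the "12to" label.

```r
# Hypothetical dimension parser; NOT bibliographica::polish_dimensions().
# Assumptions: "H x W cm" is height x width; "49-50 cm" keeps the upper
# bound; "(12⁰)" gives the format (gatherings) as the number 12.
parse_dimensions_sketch <- function(x) {
  first_match <- function(pattern, s) regmatches(s, regexec(pattern, s, perl = TRUE))[[1]]
  one <- function(s) {
    fmt <- first_match("\\(([0-9]+)[^)]*\\)", s)
    gatherings <- if (length(fmt) > 0) as.numeric(fmt[2]) else NA_real_
    height <- NA_real_
    width <- NA_real_
    hw <- first_match("([0-9]+)\\s*x\\s*([0-9]+)\\s*cm", s)   # "50 x 40 cm."
    rng <- first_match("([0-9]+)-([0-9]+)\\s*cm", s)          # "49-50 cm."
    h <- first_match("([0-9]+)\\s*cm", s)                     # "10 cm"
    if (length(hw) > 0) {
      height <- as.numeric(hw[2])
      width <- as.numeric(hw[3])
    } else if (length(rng) > 0) {
      height <- as.numeric(rng[3])
    } else if (length(h) > 0) {
      height <- as.numeric(h[2])
    }
    c(gatherings = gatherings, width = width, height = height)
  }
  vals <- vapply(as.character(x), one, c(gatherings = 0, width = 0, height = 0),
                 USE.NAMES = FALSE)
  data.frame(original = x, t(vals))
}

parse_dimensions_sketch(c("10 cm (12\u2070)", "49-50 cm. (2\u2070)", "50 x 40 cm."))
```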

Fill missing dimensions

Estimate missing dimensions

kable(polish_dimensions("10 cm (12⁰)", fill = TRUE))
original     gatherings  width  height  area
10 cm (12⁰)  12to        10     15      150
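One generic way to estimate missing dimensions is to borrow from documents of the same format. The sketch below uses that swapped-in technique and is not how polish_dimensions(fill = TRUE) works. A missing width is imputed as height times the median width/height ratio observed for the same gatherings, and area is width × height in cm². The column names gatherings, width and height are hypothetical.

```r
# Hypothetical imputation sketch; NOT polish_dimensions(fill = TRUE).
# Missing widths = height * median(width / height) within the same format;
# area is width * height (cm^2).
fill_dimensions_sketch <- function(d) {
  ratio <- d$width / d$height
  med_ratio <- ave(ratio, d$gatherings, FUN = function(r) median(r, na.rm = TRUE))
  missing_w <- is.na(d$width) & !is.na(d$height)
  d$width[missing_w] <- d$height[missing_w] * med_ratio[missing_w]
  d$area <- d$width * d$height
  d
}

d <- data.frame(gatherings = c(12, 12, 12), width = c(6, 8, NA), height = c(10, 10, 15))
fill_dimensions_sketch(d)  # third width becomes 15 * 0.7 = 10.5
```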

Publication place

Many versions of London:

x <- as.character(df.orig[, "260.a-Place of publication"])
top_plot(x[grep("London", x)], ntop = 20)

In total there are 374 unique place strings containing "London"; these need tidying up with synonym lists.
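A minimal sketch of tidying plus a synonym list, assuming two steps: brackets and trailing punctuation are stripped, then a lookup maps variant spellings to one canonical name. The synonym table here is illustrative only and is not the project's actual list.

```r
# Hypothetical place-name cleaner; the synonym table is illustrative only.
place_synonyms <- c(london = "London", londini = "London",
                    londres = "London", dublin = "Dublin")

clean_place_sketch <- function(x, synonyms = place_synonyms) {
  cleaned <- gsub("[\\[\\]]", "", x, perl = TRUE)                # "[London]" -> "London"
  cleaned <- sub("[\\s:;,.?]+$", "", cleaned, perl = TRUE)      # drop trailing " :" etc.
  cleaned <- trimws(gsub("\\s+", " ", cleaned))
  mapped <- unname(synonyms[tolower(cleaned)])
  ifelse(is.na(mapped), cleaned, mapped)                        # keep unknown places
}

clean_place_sketch(c("[London] :", "Londini,", "Dublin :", "St. Andrews :"))
# "London" "London" "Dublin" "St. Andrews"
```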

Ambiguous authors

Author gender

Enriching the data with external information

as.matrix(get_gender(polish_author(sample(unique(df$author.name), 20))$first)$gender)
##           [,1]    
## samuel    "male"  
## richard   "male"  
## william   "male"  
## gamaliel  "male"  
## robert    "male"  
## charles   "male"  
## john      "male"  
## prudencio "male"  
## thomas    "male"  
## hyde      "male"  
## thomas    "male"  
## mary      "female"
## thomas    "male"  
## zachary   "male"  
## henry     "male"  
## thomas    "male"  
## william   "male"  
## robert    "male"  
## n         NA      
## charles   "male"
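The idea behind the enrichment step is a first-name lookup. Here is a minimal hypothetical sketch with a tiny hand-made table, not the data source behind get_gender(). Names missing from the table, or initials such as "n", map to NA.

```r
# Hypothetical first-name -> gender lookup; NOT the source used by get_gender().
name_gender <- c(mary = "female", elizabeth = "female", aphra = "female",
                 john = "male", thomas = "male", william = "male")

guess_gender_sketch <- function(first_names, lookup = name_gender) {
  unname(lookup[tolower(trimws(first_names))])
}

guess_gender_sketch(c("Mary", "thomas", "n"))  # "female" "male" NA
```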

Workflow


Who wrote history?

Who wrote history?

Top-20 authors (number of titles)

top_plot(df, "author.unique", 20)

Who wrote history?

Top-10 female authors (number of titles)

Who wrote history?

Title count vs. paper consumption

Document count vs. paper for top authors

ggplot(df2, aes(x = docs, y = paper)) + geom_text(aes(label = author.unique), size = 4)
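The df2 plotted here is not constructed on this slide. One possible construction, assuming the column names used elsewhere in the slides (author.unique, paper.consumption.km2), counts titles and sums the estimated paper per author. This pipeline is our assumption, not necessarily the one behind the figure.

```r
library(dplyr)

# Hypothetical per-author summary: title count and summed paper (km^2).
author_summary_sketch <- function(df, top = 20) {
  df %>%
    filter(!is.na(author.unique)) %>%
    group_by(author.unique) %>%
    summarize(docs = n(),
              paper = sum(paper.consumption.km2, na.rm = TRUE)) %>%
    arrange(desc(docs)) %>%
    head(top)
}

# df2 <- author_summary_sketch(df)
```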

Who wrote history?

Gender distribution of authors over time. Note that name-gender associations change over time; this has not yet been taken into account.

## 
## female   male 
##  0.026  0.974
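The overall shares above have the shape of prop.table(table(gender)). To follow the distribution over time, the shares can be computed per decade instead. In this minimal sketch the column author.gender is hypothetical, and publication.decade is the column used elsewhere in the slides.

```r
library(dplyr)

# Hypothetical per-decade gender shares; author.gender is an assumed column.
gender_share_sketch <- function(df) {
  df %>%
    filter(!is.na(author.gender), !is.na(publication.decade)) %>%
    count(publication.decade, author.gender) %>%
    group_by(publication.decade) %>%
    mutate(share = n / sum(n)) %>%   # shares sum to 1 within each decade
    ungroup()
}
```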

Who wrote history?

Other questions to explore

# Alternative subsets to explore: choose one (each line overwrites df2)
df2 <- df %>% filter(publication.place == "London")
df2 <- df %>% filter(language == "French")
df2 <- df %>% filter(publication.year >= 1700 & publication.year < 1800)
top_plot(df2, "author.unique", 10)

2. Where was history published?

Top-10 places (number of titles)

top_plot(df, "publication.place", 10)

Where was history published?

df2 <- df %>% filter(publication.country %in% c("France", "Germany")) %>%
    group_by(publication.decade, publication.country) %>%
    summarize(paper = sum(paper.consumption.km2, na.rm = TRUE), docs = n()) 
p <- ggplot(df2, aes(x = publication.decade, y = docs, color = publication.country)) +
     geom_point() + geom_smooth()
print(p)     

Where was history published?

Title count vs. paper consumption

publication.place  paper (km²)  docs (titles)
London 97.7011694 34928
Dublin 9.4571200 3293
Edinburgh 5.8437808 2444
Philadelphia Pa 1.5639500 1298
Boston 0.7236274 1098
Oxford 3.1651998 920
New York N.Y 0.7263559 724
unknown 0.2702561 489
Paris 1.6005532 274
Glasgow 0.9460195 257
York 0.4156092 203
Cambridge 0.8931177 179
Providence R.I 0.0243489 164
Amsterdam 0.4696873 160
Hartford Ct 0.1189740 145
Bristol 0.1602805 96
Newcastle 0.3798063 93
Norwich 0.1631005 93
Aberdeen 0.2579055 92
Boston Ma 0.1555083 86
Cork 0.1219026 86
Watertown Ma 0.0059476 86
Charleston S.C 0.1172916 80
Newport R.I 0.0259558 76
The Hague 0.2917593 74
New London Ct 0.0797421 69
Baltimore Md 0.0811340 66
Salem Ma 0.0205162 65
Exeter 0.4385841 63
Lancaster Pa 0.0153348 61
Bath 0.1683589 58
United States 0.0074780 56
Williamsburg Va 0.0249846 54
Annapolis Md 0.0284622 53
Norwich Ct 0.0205556 51
Birmingham 0.1860875 49
Shrewsbury 0.0425996 45
Manchester 0.1607316 44
Cambridge Ma 0.0512204 38
New Haven Ct 0.0179218 38
Portsmouth N.H 0.0121432 38
Salisbury 0.0755964 38
Litchfield Ct 0.0140826 34
Albany N.Y 0.0217126 33
Nottingham 0.0350965 32
Basel 0.4876664 31
Exeter N.H 0.0053984 31
Calcutta 0.5887621 30
Coventry 0.0159278 30
Quebec 0.0236815 30
Antwerp 0.1074526 29
Belfast 0.0249338 29
Newburyport Ma 0.0193370 27
Canterbury 0.1495364 26
Kilkenny 0.0139454 25
Liverpool 0.0861505 25
Worcester 0.0943631 25
Chester 0.0356998 23
Richmond 0.0141920 23
Fishkill N.Y 0.0025932 22
Perth 0.1905675 22
Worcester Ma 0.0263652 22
Burlington N.J 0.0726256 20
Waterford 0.0213785 19
Middelburg 0.0265354 18
New Haven Ma 0.0073169 18
Rotterdam 0.0405524 18
Whitehaven 0.0551667 18
Hamburg 0.0773574 17
Poughkeepsie N.Y 0.0051354 17
Hull 0.0933926 16
Kingston 0.0209270 16
New Orleans La 0.0026800 16
New Bern N.C 0.0056072 15
Savannah Ga 0.0025012 15
Sherborne 0.0586118 15
Wilmington De 0.0082981 15
York Pa 0.0010742 15
Halifax 0.0285274 14
Leeds 0.0176414 14
Limerick 0.0171982 14
Sheffield 0.0121611 14
St. Omer 0.0730371 14
Darlington 0.0235772 13
Eton 0.0246385 13
Ipswich 0.0351292 13
Rochester 0.0086605 13
Trenton 0.0027834 13
Germantown Pa 0.0271030 12
Reading 0.0309643 12
Delft 0.0272516 11
Leicester 0.0677198 11
Leiden 0.0265161 11
Paisley 0.0972585 11
Woodbridge N.J 0.0263863 11
Bridgetown Barbados 0.0090664 10
Colchester 0.0212154 10
King’s Lynn 0.0498319 10
Danvers Ma 0.0036897 9
Derby 0.0453600 9
Basseterre Saint Kitts 0.0065792 8
Bury St. Edmunds 0.1508538 8
Concord N.H 0.0135998 8
Frankfurt 0.0257214 8
Winchester 0.0068340 8
Bennington Vt 0.0089062 7
Carlisle 0.0245411 7
Carmarthen 0.0301766 7
Dundee 0.0083906 7
Halifax N.S 0.0045989 7
Newark N.J 0.0005083 7
Norfolk Va 0.0027697 7
Northampton 0.0014144 7
Southampton 0.0080863 7
Stamford 0.0178891 7
Bolton 0.0236569 6
Carlisle Pa 0.0067984 6
Chelmsford Ma 0.0003514 6
Douai 0.0617950 6
Hudson N.Y 0.0056962 6
Nassau 0.0024300 6
New Brunswick N.J 0.0064943 6
Newark 0.1561994 6
Plymouth 0.0025221 6
Preston 0.0045244 6
Stockbridge Ma 0.0024567 6
Warrington 0.0300346 6
Winchester Va 0.0047004 6
Augusta Ga 0.0014813 5
Berlin 0.0090402 5
Chelmsford 0.0362609 5
Cologne 0.0065898 5
Doncaster 0.0041142 5
Geneva 0.0249878 5
Glocester 0.0434457 5
Hanover N.H 0.0066176 5
Hereford 0.0046653 5
Kingston Jamaica 0.0006026 5
Knoxville Tn 0.0016772 5
Londonderry 0.0115085 5
Madras India 0.0040056 5
Montreal 0.0020798 5
Richmond Va 0.0013440 5
Rouen 0.0809901 5
St. John’s Antiqua 0.0038426 5
Trenton N.J 0.0179238 5
Tunbridge Wells 0.0015591 5
Utrecht 0.0148028 5
Venice 0.0197678 5
Wesel 0.0023959 5
Amherst N.H 0.0027788 4
Ayr 0.0131100 4
Brussels 0.0041844 4
Chambersburg Pa 0.0052640 4
Cirencester 0.0104671 4
Dort 0.0035728 4
Durham 0.0036127 4
Halifax N.C 0.0011423 4
Hanover 0.0737813 4
Kelso 0.0024082 4
Kingston N.Y 0.0000782 4
Lansingburgh N.Y 0.0052544 4
Leominster Ma 0.0020976 4
Lexington Ky 0.0034146 4
Louvain 0.0092770 4
Newburgh N.Y 0.0045238 4
Portland Me 0.0037987 4
Roseau Dominica 0.0033016 4
Rutland Vt 0.0118092 4
St. Andrews 0.0087112 4
St. George’s Grenada 0.0024300 4
Stirling 0.0025859 4
Walpole N.H 0.0155485 4
Windham Ct 0.0035844 4
Banbury 0.0013403 3
Bern 0.0162279 3
Bishopstone 0.0204759 3
Bury 0.0021867 3
Chesterfield 0.0034052 3
Clonmel 0.0024480 3
Copenhagen 0.0091520 3
Emden 0.0015067 3
Ephrata Pa 0.0012844 3
Falkirk 0.0031675 3
Gainsborough 0.0157019 3
Galway 0.0014490 3
Gouda 0.0064688 3
Hagerstown Md 0.0017366 3
Hertford 0.0015831 3
Jacksonburgh S.C 0.0003312 3
Keene N.H 0.0011445 3
Leipzig 0.0151905 3
Lyon 0.0075000 3
Macclesfield 0.0114114 3
Maidstone 0.0014928 3
Pointe-A-Pitre 0.0007392 3
St. Augustine Fl 0.0001966 3
St. Pierre Martinique 0.0018900 3
Strasbourg 0.0013338 3
Sunderland 0.0164409 3
Taunton 0.0105892 3
Tewkesbury 0.0065332 3
Twickenham 0.0195888 3
Warren R.I 0.0024895 3
Westminster Vt 0.0016556 3
Windsor Vt 0.0009867 3
Wolverhampton 0.0012776 3
Alexandria Va 0.0003148 2
America 0.0003452 2
Andover Ma 0.0007632 2
Bombay 0.0021685 2
Caen 0.0084756 2
Charlottetown Pe 0.0079274 2
Chatham N.J 0.0005814 2
Columbia S.C 0.0010180 2
Dover 0.0023200 2
Dover N.H 0.0006290 2
Downpatrick 0.0001966 2
Drogheda 0.0015034 2
Dumfries 0.0013927 2
Dumfries Va 0.0014448 2
Elizabeth N.J 0.0064836 2
Florence 0.0060072 2
Fredicksburg Va 0.0012732 2
Fryeburg Me 0.0015660 2
Gloucester 0.0110184 2
Gothenburg 0.0003956 2
Gravesend 0.0097328 2
Haarlem 0.0007248 2
Harrisburgh Pa 0.0028025 2
Haverhill N.H 0.0016245 2
Hillsborough N.C 0.0009450 2
Howden 0.0016321 2
Kilmarnock 0.0053922 2
La Rochelle 0.0028282 2
Lausanne 0.0056440 2
Lincoln 0.0000616 2
Ludlow 0.0019380 2
Mechelen 0.0064680 2
Montego Bay 0.0016200 2
Montrose 0.0280108 2
New Bedford Ma 0.0024402 2
New Windsor N.Y 0.0013646 2
Newbury Ma 0.0041888 2
Newbury Vt 0.0015660 2
Newry 0.0009088 2
Northampton Ma 0.0015352 2
Peterborough 0.0028158 2
Petersburg Va 0.0008478 2
Poole 0.0015808 2
Port-au-Prince 0.0008398 2
Portsmouth 0.0008100 2
Reading Pa 0.0031616 2
Regensburg 0.0017566 2
Schenectady N.Y 0.0021120 2
Shepherdstown Va 0.0049326 2
Springfield Ma 0.0006543 2
St. Ives 0.0004928 2
Staunton Va 0.0013044 2
Strabane 0.0103954 2
Sudbury 0.0075829 2
Vevey 0.0167960 2
Vienna 0.0121524 2
Washington D.C 0.0002268 2
Wilmington N.C 0.0010076 2
Windsor 0.0051870 2
Winterthur 0.0050635 2
Wisbech 0.0008742 2
Abbeville 0.0072556 1
Abingdon 0.0001235 1
Aldermanbury 0.0013585 1
Alnwick 0.0019712 1
Altmore 0.0001350 1
Altona 0.0019760 1
Ampthill 0.0000616 1
Augusta Me 0.0005700 1
Bath N.Y 0.0006804 1
Beverley 0.0015314 1
Birstall 0.0001425 1
Blackburn 0.0488700 1
Bottisham 0.0000616 1
Bouillon 0.0158728 1
Boulogne 0.0013832 1
Brattleborough Vt 0.0001296 1
Brentford 0.0005206 1
Bridgeton N.J 0.0007224 1
Bridgnorth 0.0001350 1
Brookfield Ma 0.0019475 1
Buckden 0.0001350 1
Bungay 0.0008398 1
Burlington Vt 0.0007830 1
Burnley 0.0274120 1
Burton 0.0001425 1
Campbeltown 0.0003696 1
Cap Haitien 0.0000000 1
Carlow 0.0003458 1
Cashel 0.0007830 1
Charlestown Ma 0.0002850 1
Charlottesville Va 0.0000000 1
Chatham 0.0001350 1
Chesire Ct 0.0000475 1
Cincinnati Oh 0.0006642 1
Concord Ma 0.0001000 1
Danbury Ct 0.0002850 1
Daventry 0.0005400 1
Dedham Ma 0.0063726 1
Deptford 0.0001350 1
Devizes 0.0000124 1
Dorchester 0.0008100 1
Dresden 0.0472031 1
Dunbar 0.0002850 1
East Molesey 0.0007904 1
Easton Md 0.0001092 1
Edenton N.C 0.0008100 1
Egham 0.0002470 1
Elizabethtown Md 0.0002964 1
Ennis 0.0007904 1
Europe 0.0002464 1
Evesham 0.0117040 1
Frederick Md 0.0000000 1
Fredericton 0.0005400 1
Gateshead 0.0004928 1
Gaunt 0.0000000 1
Gdansk 0.0009856 1
Geneva N.Y 0.0007830 1
Ghent 0.0126900 1
Glocester Ma 0.0001350 1
Goa 0.0001482 1
Grantham 0.0001425 1
Greenfield Ma 0.0019000 1
Greenock 0.0007904 1
Grenada 0.0033880 1
Guernesey 0.0006175 1
Halle 0.0105819 1
Harlow 0.0009880 1
Haverhill Ma 0.0009856 1
Heidelberg 0.0035728 1
Horncastle 0.0001350 1
Houghton Park 0.0000247 1
Inverlochie 0.0003696 1
Kendal 0.0005550 1
Kirkcudbright 0.0006160 1
Knaresborough 0.0005681 1
Koningsberg 0.0000616 1
Lancaster N.J 0.0000000 1
Leeuwarden 0.0032851 1
Lewes 0.0000000 1
Liege 0.0108834 1
Lille 0.0024640 1
Margate 0.0006160 1
Marlborough 0.0008100 1
Martinsburg Va 0.0001050 1
Medford Ma 0.0000000 1
Minorca 0.0001350 1
Monmouth 0.0012844 1
Monmouth N.J 0.0059280 1
Mount Holly N.J 0.0002850 1
New Bedford Ms 0.0000000 1
Newfield Ct 0.0007830 1
Newport Isle Wight 0.0084227 1
Newton N.J 0.0007224 1
Niagara 0.0001482 1
North Shields 0.0002850 1
Northallerton 0.0010868 1
Ossining N.Y 0.0007224 1
Paris Ky 0.0007656 1
Parr-Town 0.0001350 1
Pembroke Ma 0.0016200 1
Penrith 0.0000988 1
Raleigh N.C 0.0046788 1
Reims 0.0019760 1
Rochdale 0.0002850 1
Rodborough 0.0003952 1
Rome 0.0024453 1
Roscrea 0.0008100 1
Saarbrucken 0.0370366 1
Salamanca 0.0001350 1
Salem N.Y 0.0008100 1
Salisbury N.C 0.0008100 1
Savanna-la-Mar Jamaica 0.0008100 1
Scipio N.Y 0.0008100 1
Shelburne Nova Scotia 0.0002464 1
Shiffnal 0.0002850 1
Siena 0.0022325 1
Sligo 0.0004750 1
South Shields 0.0028619 1
Spanish Town Jamaica 0.0006804 1
St. Albans 0.0391500 1
St. Eustatius 0.0003024 1
St. Germans 0.0002700 1
St. Helier 0.0002464 1
St. Mary’s Md 0.0001350 1
Stafford 0.0002700 1
Stockton 0.0059136 1
Trefeca 0.0167082 1
Trevoux 0.0000950 1
Verdun 0.0000000 1
Vergennes Vt 0.0005928 1
Walsall 0.0010780 1
Waltham 0.0001350 1
Warwick 0.0011115 1
West Springfield Ma 0.0011856 1
Wexford 0.0006240 1
Weymouth 0.0011115 1
Wigan 0.0019594 1
Winton 0.0072556 1
Wokingham 0.0008100 1
Wrexham 0.0024640 1
Yarmouth 0.0005928 1
Yeovil 0.0008100 1
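ggplot(df2, aes(x = docs, y = paper)) +
     geom_text(aes(label = publication.place), size = 3) +
     # log10 axes; places with zero estimated paper cannot be shown on a log scale
     scale_x_log10() + scale_y_log10() +
     labs(x = "Title count", y = "Paper consumption (km²)")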
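The table above has one row per place, with summed paper and a title count. A hypothetical sketch of building it, with two extra share columns added so that each place's share of titles can be compared directly with its share of paper. The share columns are our addition.

```r
library(dplyr)

# Hypothetical per-place summary with shares of all titles and all paper.
place_share_sketch <- function(df) {
  df %>%
    filter(!is.na(publication.place)) %>%
    group_by(publication.place) %>%
    summarize(paper = sum(paper.consumption.km2, na.rm = TRUE), docs = n()) %>%
    mutate(docs_share = docs / sum(docs),
           paper_share = paper / sum(paper)) %>%
    arrange(desc(docs))
}
```

A place whose paper_share exceeds its docs_share is producing, on average, larger documents than the other places.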

Where was history published?

Comparison of Scotland, Ireland and the USA:

df2 <- df %>%
    filter(!is.na(publication.country)) %>%
    group_by(publication.country) %>%
    summarize(paper = sum(paper.consumption.km2, na.rm = TRUE),
          docs = n()) %>%
    arrange(desc(docs)) %>%
    filter(publication.country %in% c("Scotland", "Ireland", "USA"))

Where was history published?

library(reshape2)   # melt()
library(gridExtra)  # grid.arrange()
dfm <- melt(as.data.frame(df2), id.vars = "publication.country")
p1 <- ggplot(subset(dfm, variable == "paper"), aes(y = value, x = publication.country)) + geom_bar(stat = "identity") + ylab("Paper consumption (km²)")
p2 <- ggplot(subset(dfm, variable == "docs"), aes(y = value, x = publication.country)) + geom_bar(stat = "identity") + ylab("Title count")
grid.arrange(p1, p2, nrow = 1)

3. How does the publishing of history change in the early modern period?

What can we say about the nature of the documents? Pamphlets (<32 pages) vs. books (>120 pages)? Book size statistics and their development over time.
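The pamphlet/book split can be expressed with cut(). This minimal sketch uses the thresholds on this slide. It assumes whole-number page counts and puts 32 to 120 pages into an intermediate class, which is our own labelling.

```r
# Hypothetical size classes: pamphlet (< 32 pages), book (> 120 pages),
# intermediate otherwise; assumes whole-number page counts.
classify_document_sketch <- function(pages) {
  cut(pages, breaks = c(-Inf, 31, 120, Inf),
      labels = c("pamphlet", "intermediate", "book"))
}

table(classify_document_sketch(c(8, 31, 32, 120, 121, 400)))
```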

Nature of the documents

Nature of the documents

Estimated paper consumption by document size
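One plausible per-copy estimate, stated as an assumption rather than as the method behind paper.consumption.km2: a copy has pages / 2 leaves, each leaf of width × height cm², and 1 km² = 10¹⁰ cm². Print-run sizes are not included.

```r
# Hypothetical per-copy paper estimate in km^2 (edition size not included).
paper_km2_sketch <- function(pages, width_cm, height_cm) {
  leaves <- pages / 2                      # two pages per leaf
  leaves * width_cm * height_cm / 1e10     # cm^2 -> km^2
}

paper_km2_sketch(200, 10, 15)  # 100 leaves * 150 cm^2 = 1.5e-06 km^2
```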

Nature of the documents

Document sizes over time

Serious statistical analysis (also in the Humanities)

Open science in (digital?) humanities


These slides are automatically generated as well


Barriers to open science in the humanities


Thomason tracts 1640-1660

Gatherings and page counts

Page counts

Page count: distribution for documents with different sizes.

How does the publishing of history change in the early modern period?

Nature of the documents

Estimated title count by document size

Nature of the documents

Top authors

Nature of the documents

Top authors title count


How does the publishing of history change in the early modern period?

How does the publishing of history change in the early modern period?

Top-4 places (title count), mean page count over time.
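A hypothetical sketch of the summary behind this slide. It ranks places by title count, keeps the top n, and averages page counts per decade and place. The column pagecount is an assumed name, since the slides do not show it.

```r
library(dplyr)

# Hypothetical: mean page count per decade for the n most frequent places.
mean_pages_sketch <- function(df, n_places = 4) {
  top_places <- df %>%
    count(publication.place, sort = TRUE) %>%
    head(n_places) %>%
    pull(publication.place)
  df %>%
    filter(publication.place %in% top_places) %>%
    group_by(publication.decade, publication.place) %>%
    summarize(mean_pages = mean(pagecount, na.rm = TRUE), .groups = "drop")
}
```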